Date: Tue, 05 Nov 1996 00:25:59 GMT
Server: NCSA/1.5
Content-type: text/html
Last-modified: Mon, 05 Sep 1994 21:28:42 GMT
Content-length: 2639

<html>
<head>
<title> Abstract of Nick Street's Thesis </title>
</head>

<body>

<h1> CANCER DIAGNOSIS AND PROGNOSIS VIA 
LINEAR-PROGRAMMING-BASED MACHINE LEARNING 
<p>
W. Nick Street<br>
Under the supervision of Professor Olvi L. Mangasarian
at the University of Wisconsin-Madison
</h1>
<br>
The purpose of this research is twofold.  The first purpose is the
development of 
machine learning methods based on linear programming.  
These advances come in the form of both novel learning algorithms and
generalization-improvement approaches that are applicable to a wide
range of learning algorithms.
The second aim is the application of these algorithms, along with
ideas from 
the fields of statistics and image processing, to problems arising in
the diagnosis and treatment of breast cancer.  
The first chapter provides background into the relevant areas of
research, most importantly machine learning and
breast cancer diagnosis and prognosis, and describes previous work
that has united the two fields.
<p>
The first major contribution of this research is an
automated cytological analysis system that we call <b>Xcyt</b>.  This
system includes a partially automatic image segmentation
facility for isolating cell nuclei from a digital image.
<b>Xcyt</b> allows computation of various size, shape and texture
features of these nuclei.  A linear separator distinguishes benign
cases from malignant cases based on 569 training examples.  <b>Xcyt</b>
also provides an estimate of the probability of malignancy for each
case.  The <b>Xcyt</b> system is currently in use at the University of
Wisconsin Hospitals.
<p>
We also address the problem of cancer prognosis, that is, determining
when a cancer is likely to recur.  A new learning method, the
recurrence surface approximation (RSA), is introduced for predicting
the time of recurrence.  This procedure uses linear programming to
predict recurrence from right-censored input data.  Various extensions
and variations of RSA are also explored, including a separation-based
procedure that we call implicit RSA (IRSA).
<p>
The topic of improving the generalization of a learning system is
first addressed in the context of RSA, where we introduce a 
procedure for choosing the most relevant input features for
use the the predictive model.  The final two chapters also describe
generalization-enhancement methods.  The first is mock generalization,
which incorporates a tuning set into the optimization process.  The
second is banded approximation, which explicitly defines a tolerance
band around the predictive surface in order to avoid overfitting the
training data.

</body>
</html>

